Methods for Increasing Instruction-Level Parallelism 



Claims 

1. A processor device comprising



an instruction stream transformation unit and 
an instruction stream cache 

wherein said instruction stream transformation unit transforms code blocks of instructions from 
an original instruction set architecture to a transformed instruction set architecture. 

2. A processor device as in claim 1 further comprising an execute unit
that can directly execute instructions in both said original instruction set architecture and said 
transformed instruction set architecture. 

3. A processor device as in claim 2 wherein said execute unit is an in-order execute unit.

4. A processor device as in claim 2 wherein said execute unit is a dynamic out-of-order execute
unit. 



5. A processor device as in claim 1 further comprising a working memory connected to said
instruction stream transformation unit.



6. A processor device as in claim 1 wherein said instruction stream transformation unit
transforms blocks of code which are presumed hyper-blocks.

7. A processor device as in claim 1 wherein said instruction stream transformation unit
transforms blocks of code which are denoted as hyper-blocks in original instruction set
architecture code. 

8. A processor device as in claim 1 wherein said instruction stream cache comprises means of 
using a tag for each cache line to denote whether the tag is a start of a hyper-block. 

9. A processor device as in claim 1 wherein said instruction stream cache comprises means of 
using a hyper-block ID as an alternative way of addressing a transformed block of code.

10. A processor device as in claim 9 wherein said instruction stream cache comprises means of
storing a plurality of hyper-block lines that are chained together using a common hyper-block ID
plus pointers.

11. A processor device as in claim 1 wherein said instruction stream cache only hits when a
transformed block of code is entered starting from the first instruction of the block.
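
As an illustration only, and not part of the claims, the hit rule of claim 11 can be sketched in software. The class and method names below are hypothetical, and a dictionary keyed by block start address stands in for the tagged cache lines; the actual hardware organization is not specified here.

```python
from typing import Dict, List, Optional


class InstructionStreamCache:
    """Toy model: transformed blocks are indexed only by their start address."""

    def __init__(self) -> None:
        # Hypothetical storage: start address -> transformed instructions.
        self._blocks: Dict[int, List[str]] = {}

    def fill(self, start_address: int, transformed: List[str]) -> None:
        """Store a transformed block under the address of its first instruction."""
        self._blocks[start_address] = transformed

    def lookup(self, address: int) -> Optional[List[str]]:
        """Hit only when the fetch address is the first instruction of a block."""
        return self._blocks.get(address)


cache = InstructionStreamCache()
cache.fill(0x100, ["t0", "t1", "t2"])
hit = cache.lookup(0x100)    # entry at the block start
miss = cache.lookup(0x104)   # entry into the middle of the block
```

In this sketch a fetch that lands in the middle of a transformed block misses, so execution would have to fall back to the original instruction stream.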

12. A processor device as in claim 1 wherein said instruction stream transformation unit 
comprises means of instruction re-ordering. 

13. A processor device as in claim 1 wherein said instruction stream transformation unit
comprises means of performing predication if-conversion.

14. A processor device as in claim 1 wherein said instruction stream transformation unit
comprises means of converting a load instruction into a speculative load and a load activation
instruction pair.



15. A processor device as in claim 1 wherein said instruction stream transformation unit
comprises a parallel dependency detector circuit.



16. A processor device as in claim 1 wherein said instruction stream transformation unit
processes groups of instructions within an instruction window. 

17. A processor device as in claim 16 wherein said instruction stream transformation unit 
processes groups of instructions using overlapping instruction windows. 

18. A processor device as in claim 1 wherein said instruction stream transformation unit
comprises means of creating a dependency matrix to represent potential dependencies between
instructions. 
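
As an illustration only, and not part of the claims, a dependency matrix of the kind named in claim 18 can be sketched in software. The `Instr` record and the register names are assumptions made for this example; entry `[i][j]` is set when instruction `i` must stay after the earlier instruction `j` because of a read-after-write, write-after-read, or write-after-write hazard on a register.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # register written ("" if none); hypothetical encoding
    srcs: Tuple[str, ...]   # registers read


def build_dependency_matrix(block: List[Instr]) -> List[List[bool]]:
    """Return dep where dep[i][j] is True if instruction i depends on earlier j."""
    n = len(block)
    dep = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            later, earlier = block[i], block[j]
            raw = earlier.dest != "" and earlier.dest in later.srcs   # read after write
            war = later.dest != "" and later.dest in earlier.srcs     # write after read
            waw = later.dest != "" and later.dest == earlier.dest     # write after write
            dep[i][j] = raw or war or waw
    return dep


block = [
    Instr("r1", ("r2", "r3")),  # 0: r1 = r2 + r3
    Instr("r4", ("r1",)),       # 1: r4 = r1   (reads r1 written by 0)
    Instr("r2", ("r5",)),       # 2: r2 = r5   (overwrites r2 read by 0)
    Instr("r6", ("r7",)),       # 3: r6 = r7   (independent)
]
matrix = build_dependency_matrix(block)
```

Only the lower triangle is filled, since an instruction can depend only on instructions that precede it in the block. Memory dependencies are left out of this sketch.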

19. A processor device as in claim 18 wherein said instruction stream transformation unit
comprises means of creating an operand mapping table to represent potential writes of operands 
by instructions.

20. A processor device as in claim 1 wherein said instruction stream transformation unit
comprises means of performing register renaming.

21. A processor device as in claim 20 wherein said register renaming by the instruction stream
transformation unit allocates a set of physical registers that are separate from a second set of
registers allocated by dynamic register renaming done by the execute unit. 

22. A processor device as in claim 1 wherein the instruction stream transformation unit 
comprises means of performing an instruction scheduling algorithm. 

23. A processor device as in claim 22 wherein said scheduling algorithm is a list scheduling
algorithm. 
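
As an illustration only, and not part of the claims, a list scheduling pass can be sketched as follows. The function name, the `deps` encoding, unit latency for every instruction, and the use of program order as the priority function are all assumptions of this example; practical list schedulers typically rank ready instructions by a metric such as critical-path length.

```python
from typing import Dict, List, Set


def list_schedule(deps: Dict[int, Set[int]], width: int) -> List[List[int]]:
    """Greedy list scheduling with unit latency.

    deps[i] is the set of instructions that must complete before i issues.
    width is the number of instructions that may issue per cycle.
    Returns one list of issued instruction indices per cycle.
    """
    remaining = set(deps)
    done: Set[int] = set()
    cycles: List[List[int]] = []
    while remaining:
        # Ready list: every predecessor finished in an earlier cycle.
        ready = sorted(i for i in remaining if deps[i] <= done)
        if not ready:
            raise ValueError("cyclic dependencies")
        issued = ready[:width]          # priority: lowest program index first
        cycles.append(issued)
        done.update(issued)
        remaining.difference_update(issued)
    return cycles


# 1 depends on 0; 3 depends on 1 and 2; 0 and 2 are independent.
deps = {0: set(), 1: {0}, 2: set(), 3: {1, 2}}
schedule = list_schedule(deps, width=2)
```

With an issue width of two, instructions 0 and 2 share the first cycle, and the dependent instructions 1 and 3 follow in later cycles.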

24. A processor device as in claim 1 wherein said execute unit comprises means of performing
dynamic memory disambiguation and said transformed instruction set architecture supports
notations for indicating ambiguous memory operations.

25. A processor device as in claim 24 wherein said instruction stream transformation unit
comprises means of creating an ambiguous memory dependency matrix.



26. A processor device as in claim 24 wherein said instruction stream transformation unit
comprises means of converting an ambiguous memory read instruction into a speculative read 
and a read check instruction pair. 

27. A processor device as in claim 1 comprising a dynamically-scheduled execute unit. 

28. A processor device as in claim 1 wherein the transformed instruction set architecture
comprises dependency notation means of explicitly describing dependencies between
instructions.



29. A processor device as in claim 28 wherein said means of explicitly describing dependency
notation means allows instructions to be grouped into mini-tuples for dependency notations.

30. A processor device as in claim 28 wherein said dependencies are represented by dependency
pointers in the transformed instruction set architecture.

31. A processor device as in claim 28 wherein said dependencies are represented by dependency
vectors in the transformed instruction set architecture. 

32. A processor device as in claim 1 further comprising means to perform semi-dynamic
instruction code re-writing and re-scheduling. 

33. A processor device as in claim 1 further comprising at least one run-time history table.

34. A processor device as in claim 33 comprising a value prediction history table, whereby
run-time behavior can be recorded for subsequent instruction scheduling.



35. A processor device comprising a predicate history table, whereby run-time behavior can be
recorded for subsequent instruction scheduling.



36. A processor device comprising a data hit/miss history table, whereby run-time behavior can
be recorded for subsequent instruction scheduling.



37. A processor device comprising an ambiguous memory conflict history table, whereby
run-time behavior can be recorded for subsequent instruction scheduling.

38. A method of providing precise interrupts in a processor implementing instruction stream 
transformation comprising the steps of 

mapping an original instruction set architecture instruction to an equivalent group of one or more 
transformed instruction set architecture instruction(s), 

using physical registers that have not been committed to logical registers to hold results, and

allowing the final instruction in said group to commit the physical register result(s) to logical
register(s).

39. A method of providing precise interrupts in a processor implementing instruction stream
transformation comprising the steps of 

assigning an instruction sequence number to each original instruction set architecture instruction
starting from the beginning of the code block, 

marking each transformed instruction set architecture instruction with the corresponding
instruction sequence number, and 

committing the results of instructions in order of the instruction sequence numbers.
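
As an illustration only, and not part of the claims, the commit step of claim 39 can be sketched in software. The function name and data layout are hypothetical, and the sketch assumes one transformed instruction per sequence number, with numbers starting at 0 and no gaps.

```python
from typing import Dict, List, Tuple


def commit_in_order(completions: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Commit results strictly in instruction sequence number order.

    completions lists (sequence_number, result) pairs in the order the
    instructions finish executing, which may differ from program order.
    """
    pending: Dict[int, str] = {}        # finished but not yet committed
    committed: List[Tuple[int, str]] = []
    next_seq = 0                        # oldest sequence number not yet committed
    for seq, result in completions:
        pending[seq] = result
        # Drain every result that is now contiguous with the committed prefix.
        while next_seq in pending:
            committed.append((next_seq, pending.pop(next_seq)))
            next_seq += 1
    return committed


# Instruction 2 finishes first but is held until 0 and 1 have committed.
order = commit_in_order([(2, "c"), (0, "a"), (1, "b")])
```

Because nothing younger than `next_seq` is ever committed, an interrupt taken at any point in this sketch sees only the results of a prefix of the original instruction sequence.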

40. A software method of performing code scheduling comprising the step of building a
dependency matrix. 

41. A processor device comprising means of executing an instruction set architecture wherein
the instruction set architecture can explicitly note dependencies between instructions by using
dependency vectors.



42. A processor device comprising means of executing an instruction set architecture wherein
the instruction set architecture can explicitly note dependencies between instructions by using
dependency pointers.



43. A processor device comprising an instruction stream cache and means of using software 
routines to perform instruction stream transformation on code blocks. 